    An adaptive clustering and classification algorithm for Twitter data streaming in Apache Spark

    Continuously generated big data from social networking sites such as Twitter and Facebook has become an attractive research focus in recent decades because of its timeliness, accessibility and popularity, although there may be a trade-off in accuracy. Clustering of Twitter data in particular has attracted researchers' attention, so an algorithm that can cluster data in less computational time, especially streaming data, is needed. The presented adaptive clustering and classification algorithm for data streaming in Apache Spark addresses these problems in two phases. In the first phase, the pre-processed Twitter data is clustered using an Improved Fuzzy C-means (FCM) clustering algorithm, and the clustering is further refined by an Adaptive Particle Swarm Optimization (PSO) algorithm. The clustered data stream is then evaluated in the Spark engine. In the second phase, the pre-processed Higgs data is classified using a modified support vector machine (MSVM) classifier with grid-search optimization. Finally, the optimized data is evaluated in the Spark engine, and the evaluated values are used to obtain the resulting confusion matrix. The proposed work uses the Twitter and Higgs datasets for data streaming in Apache Spark. Computational experiments demonstrate the superiority of the presented approach over existing methods in terms of precision, recall, F-score, convergence, ROC curve and accuracy.
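    The first phase rests on fuzzy c-means clustering, in which every point has a degree of membership in each cluster rather than a single hard label. The sketch below is the textbook FCM algorithm in pure Python (fuzzifier m > 1, membership update u_ij = 1 / Σ_k (d_ij / d_ik)^(2/(m-1)), centres as u^m-weighted means). It is not the authors' Improved FCM, and it omits the Adaptive PSO refinement and the Spark integration; the function name, defaults and sample points are illustrative assumptions.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Textbook fuzzy c-means (illustrative; not the paper's Improved FCM).

    points: list of equal-length numeric tuples; c: number of clusters;
    m: fuzzifier, must be > 1. Returns (centers, u) where u[i][j] is the
    membership of point i in cluster j and each row of u sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        u.append([v / s for v in row])

    centers = []
    exp = 2.0 / (m - 1.0)
    for _ in range(max_iter):
        # Centre update: mean of the points weighted by u_ij^m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / total
                for d in range(dim)))

        # Membership update from distances to the new centres.
        max_change = 0.0
        for i in range(n):
            dists = [math.dist(points[i], centers[j]) for j in range(c)]
            if min(dists) == 0.0:
                # Point sits exactly on a centre: full membership there.
                k = dists.index(0.0)
                new_row = [1.0 if j == k else 0.0 for j in range(c)]
            else:
                new_row = [1.0 / sum((dists[j] / dists[k]) ** exp for k in range(c))
                           for j in range(c)]
            max_change = max(max_change,
                             max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row

        # Stop once memberships have effectively stopped changing.
        if max_change < tol:
            break
    return centers, u


# Usage: two well-separated groups of 2-D points (made-up sample data).
pts = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centers, u = fuzzy_c_means(pts, c=2)
labels = [max(range(2), key=lambda j: row[j]) for row in u]  # hard labels
print(labels)
```

    The first three points should share one label and the last three the other; which group receives index 0 depends on the random initialisation. In the paper's pipeline, a step such as PSO would then tune this clustering further.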

    Socio-Transactional Impact of Recency, Frequency, and Monetary Features on Customers’ Behaviour in Telecoms’ Churn Prediction

    Due to increasing competitiveness in the telecoms market, it has become more necessary for operators to build personal relationships with customers for targeted retention strategies. Achieving this goal requires an effective churn prediction model that solves the problem of churn misclassification, which persists in current churn prediction models. Because several existing segment-oriented churn prediction models fail to harness the associative networking provided by telecoms users, churn prediction accuracy is not guaranteed and targeted decision support is not enhanced. This research therefore adds the Customer’s Influence Degree (I) to the existing Recency, Frequency, and Monetary (RFM) values as an additional predictive factor for determining a customer’s churn class. The aim is to use, through RFM analysis, the socio-transactional affinities between customers and their direct dependents at targeted communication nodes to determine a customer’s dominance in the community. The newly introduced predictive factor helped minimise the churn misclassification rate by correctly reclassifying customers who had been wrongly classified as churners or non-churners when only the existing RFM churn scores were used.
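    The abstract does not state how the Customer’s Influence Degree (I) is computed or combined with the RFM values. As a purely hypothetical illustration, the sketch below takes I to be a customer’s number of distinct contacts in the call graph, min-max scales R (inverted, so recent activity scores higher), F, M and I to [0, 1], averages them, and flags scores below an arbitrary threshold of 0.5 as likely churn. The function name `rfm_with_influence`, the equal weights, the threshold and the sample records are all assumptions, not the authors' model.

```python
from collections import defaultdict
from datetime import date


def rfm_with_influence(transactions, calls, today, threshold=0.5):
    """Hypothetical RFM + influence-degree churn scoring (illustrative only).

    transactions: iterable of (customer_id, date, amount)
    calls: iterable of (caller_id, callee_id) call-graph edges
    Returns {customer_id: {"raw", "rfm_score", "score", "churn"}}.
    """
    last_seen, freq, spend = {}, defaultdict(int), defaultdict(float)
    for cust, day, amount in transactions:
        last_seen[cust] = max(last_seen.get(cust, day), day)
        freq[cust] += 1
        spend[cust] += amount
    if not last_seen:
        return {}

    # Assumed definition of I: distinct contacts (undirected degree).
    contacts = defaultdict(set)
    for a, b in calls:
        if a != b:
            contacts[a].add(b)
            contacts[b].add(a)

    customers = sorted(last_seen)
    raw = {c: {"R": (today - last_seen[c]).days, "F": freq[c],
               "M": spend[c], "I": len(contacts[c])} for c in customers}

    def scaled(key, invert=False):
        # Min-max scale one feature to [0, 1]; invert so higher means better.
        vals = [raw[c][key] for c in customers]
        lo, hi = min(vals), max(vals)
        out = {}
        for c in customers:
            x = 0.5 if hi == lo else (raw[c][key] - lo) / (hi - lo)
            out[c] = 1.0 - x if invert else x
        return out

    r = scaled("R", invert=True)  # fewer days since last activity is better
    f, m, inf = scaled("F"), scaled("M"), scaled("I")

    result = {}
    for c in customers:
        rfm_score = (r[c] + f[c] + m[c]) / 3            # RFM only
        score = (r[c] + f[c] + m[c] + inf[c]) / 4       # RFM + influence
        result[c] = {"raw": raw[c], "rfm_score": rfm_score,
                     "score": score, "churn": score < threshold}
    return result


# Usage with made-up records.
transactions = [
    ("alice", date(2024, 3, 30), 50.0),
    ("alice", date(2024, 3, 25), 40.0),
    ("bob", date(2024, 1, 5), 5.0),
]
calls = [("alice", "bob"), ("alice", "carol"), ("alice", "dave")]
scores = rfm_with_influence(transactions, calls, today=date(2024, 4, 1))
for cust, info in scores.items():
    print(cust, round(info["score"], 3), "churn" if info["churn"] else "stay")
# alice 1.0 stay
# bob 0.0 churn
```

    Returning both `rfm_score` and `score` makes it possible to compare, customer by customer, whether adding I moves anyone across the churn threshold, which is the kind of reclassification the abstract describes. A real model would learn the weights and threshold from labelled churn data rather than fixing them by hand.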